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SUMMARY 

This paper discusses a neural network tool for more effective aircraft design evaluations during wind 
tunnel tests. Using a hybrid neural network optimization method, we have produced fast and reliable 
predictions of aerodynamical coefficients, found optimal flap settings, and flap schedules. For 
validation, the tool was tested on a 55% scale model of the USAF/NAS A Subsonic High Alpha 
Research Concept (SHARC) aircraft. Four different networks were trained to predict coefficients of 
lift, drag, moment of inertia, and lift drag ratio (Cl, Cj> Cm and L/D) from angle of attack and flap 
settings. The latter network was then used to determine an overall optimal flap setting and for finding 
optimal flap schedules. 


INTRODUCTION 

Wind tunnel testing can be slow and costly due to high personnel overhead and intensive power 
utilization. Thus, a method that reduces the time spent in a wind tunnel, as well as the work load 
associated with a test, are of major interest to airframe manufacturers and design engineers. 

Modem wind tunnels have become highly sophisticated test facilities used to measure a number 
of performance features of aircraft designs. In this study we have chosen to consider only 
determination of the coefficient of lift (CJ, coefficient of drag (C D ), pitching moment (C M ) and 
lift/drag ratio (L/D) as functions of angle of attack and flap settings. In this paper we emphasize 
prediction, but the techniques are applicable to other steps of wind tunnel testing as well. 

Currently, a new design test is followed by extensive manual data fitting and analysis. To allow 
researchers to interpolate between measurements, evaluation of the aircraft design is based on visual 
inspection of curves. One way to automate the procedure is to find mathematical expressions to 
describe the complex relationships between variables. Although neural networks are not the only 
approach potentially able to perform this task, e.g., numerical aerodynamic simulations, such soft 
computing methods provide a very cost effective approach. Spin-off benefits can also result from a 
new approach and include increased automation of measurement processing and aids for checking 
earlier calculations. The longer term benefits are a significant reduction in costs and faster 
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development of new aircraft, or alternate tunnel uses such as more aerodynamically efficient 
automotive design. 

This paper is organized as follows: A short introduction to Multilayer Perceptrons (MLP) is given 
and one powerful method we used (a variation on the Levenberg-Marquardt method) is presented. 
Next, we describe how a subset of test measurements were used with the technique to train four 
networks to predict aerodynamical coefficients and the L/D ratio, given angle of attack and flap 
settings. We then present two applications. The first addresses the problem of determining an 
“overall optimal” flap setting using a method based on integration of L/D vs. C L . The second 
demonstrates an easy strategy to find optimal flap schedules. Finally, details of the software tool 
set are given in an appendix as a supplement to documentation in the project code. 

MULTILAYER PERCEPTRONS 

The phrase “neural network” is an umbrella covering a broad variety of different techniques. The 
most commercially used network type is probably the MLP network. See reference 1. An example of 
a MLP network is shown in figure 1. In this study we used a two-layer network with tangent 
hyperbolic activation functions in hidden layer units, and a linear transfer function in the output 
units. A two-layer network is not always an optimal choice of architecture (goodness measured in 
terms of the smallest number of weights required to obtain a given precision), but it is sufficient to 
approximate any continuous function arbitrary well (ref. 2), and training is easier to implement and 
faster in this case. 

A MLP network is a special type of an “all-purpose” function, which in many recent situations has 
shown an excellent ability for function approximation (ref. 3). The network shown in figure 1 
corresponds to the following functional form 



Figure 1. A three input, two output, two layer MLP network. The weights from the inputs set to 1 
represent the biases. Here f t (x) = tanh(x) and F ■ (x) = x. 



A special feature offered by this type of network is that it can be trained to approximate many 
functions well, without requiring an extravagant amount of parameters (weights). This is discussed 
in Sjoberg, et.al., (ref. 4). One disadvantage compared to other network types is that training is slow 
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because the network implements a non-linear regression, i.e., there is a nonlinear relation between 
the adjustable parameters, the weights, and the output. 

In this study, we were interested in enhancing generic neural networks for wind tunnel test 
estimation. Obtaining network training data is very costly, e.g., $3,000 dollars per tunnel hour for 
the National Full-Scale Aerodynamic Complex at Ames Research Center. Consequently, only limited 
data sets were available and usually as byproducts of previously scheduled tests. The size of the data 
set imposes an upper limit on how many weights the networks should contain. In practice, since 
there is also uncertainty associated with the measurements, the number of data points must exceed 
the number of weights by a sufficiently large factor to ensure that good generalization may be 
achieved. Training time, on the other hand, is not of prime importance. Many arguments can be 
made in favor of some form of MLP networks as the right choice for the given problem. The real 
problem is obtaining extremely high accuracies critical for commercial viability. 

TRAINING 

The training phase is the process of determining network weights from a collected set of 
measurement data. The treatment of different aerodynamical coefficients is essentially identical, so 
we will consider a generic quantity 'y' instead. If the aircraft flaps are coupled, y becomes a function 
of three different variables: Angle of attack ( a ), leading edge flap angle (LE), and trailing edge flap 
angle (TE). 


y=g Q (<p) 


( 2 ) 


where 


<p=[« LE TE] t ( 3 ) 

The function 'g' is unknown, but the wind tunnel tests provide us with a set of corresponding y-q> 
pairs 


Z N ={[<p‘,y},i = l...,N} (4) 

Naturally the measurements of y are not exact, but will be influenced in undesired ways from a 
number of different sources. All measurement errors are grouped in one additive noise term, e 

y-go( ( p) +e 

The objective is now to train the neural network to predict y from 

y = g(q>) (6) 

The predictor is found from the set of measurements, Z N , from here on denoted the training set. 
Expressed precisely, we wish to determine a mapping from the set of measurements to the set of 
functions contained in the chosen network architecture g((p; 0) 

Z N ->0 (7) 
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so that y is close to the “true” y. 0 is the parameter vector containing all adjustable parameters (in 
this case, the network weights). 

A common definition for goodness of fit in neural nets is the mean square error 

=dyL ( «w (8) 

Thus, training becomes a conventional unconstrained optimization problem. For various reasons 
back-propagation, a flexible but somewhat ad hoc gradient search method, has been the preferred 
training algorithm in the neural network community. Ease of implementation, utilization of the 
inherent parallel structure, and the ability to work on large data sets are the main arguments justifying 
the use of this method. However, in the present case where the data sets are of limited size, back- 
propagation is not the best choice. Instead we have decided to use the so-called Levenberg- 
Marquardt method for solving the optimization, since like conjugate gradient approaches it is in 
many ways superior to back-propagation as well as most other gradient search methods. The 
Levenberg-Marquardt method, independent of its neural implementation, is a work horse of many 
optimization packages (ref. 5). Some important advantages of the method are speed, guaranteed 
convergence to a (local) minima, numerical robustness, and minimal user-specified inputs are 
necessary except for providing a network architecture. Moreover, as pointed out in More (ref. 6) the 
method is surprisingly free of ad hoc solutions to achieve these benefits. Such advantages are 
important properties in making a user-friendly, easy-to-apply tool, which is crucial in this case since 
our objective was to create a generic methodology for application use and determine if in fact neural 
networks were capable of performing the complex mappings required in nonlinear aero design. 

The Levenberg-Marquardt method has numerous variations. The simplest strategy may be found in 
the original contribution of Marquardt, while one adaptation to neural network training is discussed 
in reference 7. The version used here belongs to the class of trust region methods found in Fletcher 
(ref. 8). Just as back propagation, the Levenberg-Marquardt algorithm is an iterative search scheme 

#* + i>= & k) + ^ k) h (k) (9) 

From the current iterate a new iterate is found by moving a step of size ji in direction h 
There exist several methods that fit into this structure. Their differences lay mainly in the way that a 
search direction is determined. In back propagation, a search direction is chosen as the gradient of 
the cost function evaluated at the current iterate 


h w = G(& k) ) = V\6 ik) ) = 


dV(6) 


de 


9=ef k) 


( 10 ) 


while the step size can be either constant or vary according to some adaptive scheme. For a given 
cost function, the gradient is determined by 




de 


im -L £*<». 


de 


C(0) = -^Xi/ ( 0)£'(0) 

A I=1 


(ID 
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y/(0) denotes the gradient of the network output w.r.t. each of the weights, when the input is <pf 




de 


( 12 ) 


Another alternative method is the Gauss-Newton algorithm. Although convergence is not guaranteed 
and it suffers from severe numerical ill-conditioning, it provides an important basis for the 
Levenberg-Marquardt method. The idea is at each iteration to minimize a cost function based on a 
linear approximation of the prediction error 


l( 0) = — Xfe^Vlv'W^V-O) 2 

2 N ;_1 ' 


(13) 


It is shown (ref. 8) that this approach will fit into the basic scheme by setting the step size to 1, and 
the search direction to 


h w = [ R( & k) )] _1 G(& k) ) (14) 

R represents the “Gauss-Newton” approximation to the Hessian (V"(6)) and is defined by 

m = < 15 ) 

Ntt 

In the Levenberg-Marquardt method however, the search direction is found as an “intelligent” 
interpolation between the two previously mentioned directions as follows: 

1) Create an initial parameter vector & 0) and an initial value A <p) 

2) Determine the search direction, h, by solving the following system of equations 

[R(0 (fc) ) + A w ]/i w = ’-^ ) ) 

3) Evaluate network and determine V(0 W + h (k) ) as well as the “prediction” of the error 
V(9 {k) )- L(& k) +h ik) ) 

i ( « 

4) V(^)-V(^4/i c<) )>O.75[y(0 :4) )-L(^ , +/i ( ^ => X ik+1) = ^~ 

5) V(^)-V(^4/z^)<0.25[y(^ ) )-^^ ) +^ ) )] => X ik+l) = 2X ik) 

6) If V N (0 (k) +h ik) )<V N (& k) ) then accept & k+l) = & k) + h {k) as anew iterate 

7) If the stop criterion is not satisfied set k=k+l and go to 2). Otherwise set 0= Q [k) and 

terminate 


Clearly, if A is too large, the diagonal matrix will overwhelm the Hessian (R) in 2). The effect of 
this is a search direction approaching the gradient direction, but with a step size close to 0. This is 
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important to ensure convergence, since the cost function always may be minimized by taking small 
enough steps in the direction of the gradient. On the other hand, if X equals zero, the method will 
coincide with the Gauss-Newton method. What then is a sensible strategy for adjustment of X ? The 
choice is basically to decrease X if the approximation of error is reasonable and vice versa if it is 
not. This is just what is tested in steps 4) and 5) of the above algorithm. If a new iterate leads to a 
decrease in cost close to what is predicted using L( 6 ) , A is reduced. Since the right hand sides of 4) 
and 5) always are positive, X always is increased until a decrease in cost is obtained. 

From Madsen (ref. 9) it follows that 

[v(e (t) ) - L(e m + fc (t) )] = g(0“>) + ( 1 7 ) 

which can be easily computed for use in step 3) and 4) of the procedure. 

Some typical termination criteria for use in step 7) are: 


a. The gradient is sufficiently close to zero. 

b. The cost function is below a certain value. 

c. A maximum number of iterations is reached. 

d. X exceeds a certain value. 

Since the cost function will often have a number of local minima, it is important to run the training 
algorithm multiple times, starting from different sets of initial weights. The set of trained weights 
that leads to the lowest minima is then chosen. 

GENERALIZATION 

Before applying the above algorithm, a few comments should be made regarding mean square error 
cost functions. Actually, the mean square error criterion is not really what we are most interested 
in minimizing in this particular problem. This is especially so because the measurements available are 
corrupted by noise. A far better measure of fit is the mean square error at “all possible” 
measurements. This is what is known as the generalization error, the ability to predict new 
measurements not seen during the training phase. 

V(S) = £:{v(e)} ( 18 ) 

Besides being unrealistic to compute in practice, this quantity does not give information about 
whether the selected network architecture is a good choice. 6 depends on the training set and is 
thereby a stochastic variable, which in turn means that V(6) is a stochastic variable. Taking the 
expectation of V(6) with respect to all possible training sets of size N, yields 

J(M) = e{v(0)} (19) 
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which is called the average generalization error or model quality measure (ref. 10). Assuming the 
existence of set of “true” weights, 0 O , allowing us to exactly describe the data-generating function by 
the network architecture (g(<p; Q 0 ) = g 0 ((p)), and assuming the noise contribution is white noise 
independent of the inputs, an estimate of the average generalization error may be achieved. This 
estimate is known as Akaike's final prediction error (FPE) estimate. See ref. 10 for a derivation. 


~ N+d 

= (20) 

N~d M 

d M is the total number of weights in the network, while N as before is the size of the training set. 

The more we increase the size of the network, the more flexibility we add and the better we are able 
to fit the training data. In other words, V(6) is a decreasing function of the number of weights in the 
network. If too much flexibility is added, one may expect that both the essential properties of the 
training set are captured and unfortunately, the properties of the particular noise sequence present in 
the data. This is commonly known as over fitting, and is exactly the dilemma Akaike’s FPE 
expresses. There exist two usual approaches to deal with this problem. One is to find an 'optimal' 
network architecture (an architecture that minimizes J). The most successful strategy developed so 
far is pruning (see ref. 11). However, in order for it to be applicable, the data set should not be too 
limited compared to the required network size. 

A second approach is to introduce a simple yet powerful extension to the cost function, called 
Regularization (or weight decay). Given the previously mentioned assumptions, it is a known result 
that the least squares estimate is unbiased. But unfortunately 


E 



+ — tr(R~ l ) 
N 


( 21 ) 


In other words, when minimizing the mean square error the weight estimates tend to be exaggerated. 
Imposing a punishment for this tendency in the cost function is called (simple) Regularization 


= ( 22 ) 

The simple Regularization approach leads to biased weights, since the weights are pulled towards 0. 
But by choosing the scalar sufficiently small, it can be shown that the average generalization error 
will decrease. See Sjoberg and Ljung (ref. 12) for a proof. Finding the optimal value is not a trivial 
task. More detailed discussions may be found in references 13 and 14. A rule of thumb is that a little 
Regularization usually helps. Other nice properties about Regularization are that it significantly 
reduces the number of local minima, as well as the number of required iterations to find a minimum. 
The Regularization extension to the cost function clearly influences the training algorithm, but 
incorporation is quite straightforward. In algorithm (16), we make the following changes: 


The gradient of W becomes 

1 ( N \ 

G{6) = W'(Q) = Y v (0)£* (0) + Se 

) 


(23) 
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Also the Hessian is changed, which in turn changes the expression for determination of the search 
direction 


R(d (k) ) + (— + X ik} )I 
N 


h m = G( 0W) 


(*)' 


(24) 


Since this expression from a numerical conditioning standpoint is the weak link of the training 
algorithm, it should be noticed that a spin-off from regularization is a robustness increasing effect on 
training, since the matrix (the term in brackets) will be moved further away from singularity. This is 
performed as follows: 


Substitute V for W in steps 3 to 6 of (16). 


Also L is changed to: 


leading to 


L(6) = L(6) + — \\Q\\ 


[V(0 {k) )-L(9 ik) 


+ 



((h w ) T G(d {i) ) + 


V 


(l w + ^) 
N 



(25) 


(26) 


APPLYING NETWORKS TO THE MEASUREMENTS 

The aircraft used for method validation was a 55% scale model of the SHARC aircraft. See 
Picture 1. 


Picture 1: SHARC aircraft mounted in the 40 x 80 ft. wind tunnel. 

The test was conducted in the 40 x 80 ft. wind tunnel at NASA Ames Research Center part of the 
National Full-Scale Aerodynamics Complex shown in Picture 2. 
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Picture 2: National Full-Scale Aerodynamics Complex. 


The control surfaces used in this study included leading edge and trailing edge flaps. During tests the 
left and right controllers for each flap were coupled. Each flap has six different set angles, thus 
thirty-six flap combinations were possible. The test involved picking a flap combination and varying 
angle of attack over a prespecified range, while measuring lift, drag, and pitching moment. Twenty- 
three of the total of thirty-six possible flap combinations were examined in the test, and the 
measurements were taken at sixteen different angles of attack (alpha) values for each combination. 
Table 1 presents the organization of the full data set and how the twenty-three subsets were separated 
into training and test sets, respectively. 


Table 1 . Organization of the data set for the 10% scale model test. “Train” indicates that the 
subset was included in the training set, similarly for test . The entire data set consists of 
measurements from 348 different alpha/LE/TE combinations. 
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Despite the possible differences in complexity, four identical network architectures were chosen to 
predict C L , C D , C M andL/D. 
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one should test different architectures (or vary the regularization parameter) and pick one that leads to 
a good compromise between having predictions close to the actual measurements and having the 
intermediate predictions following smooth curves. Typical results obtained from applying the trained 
networks to one of the six test sets are shown in figure 4. 

Coefficient of lift vs. angle of attack Coefficient of drag vs. angle of attack 



Figure 4. Comparison of test data and network predictions for LE=0.50 and TE=0.0. 

It is difficult to come up with a good validity criterion in terms of an RMS value or a similar 
measure for this problem. Basically the validation was done by visual inspection of plots like those 
of figure 4. The predictions appear to be very close to the actual measurements, and taking the 
uncertainty of the measurements into consideration, the predictions are definitely considered to be 
satisfactory. L/D is harder to model than the 3 aerodynamical coefficients. An alternative way of 
predicting this ratio is to divide the predictions of C L and C D 

UD=C l /C d (27) 

Unfortunately this strategy is very sensitive to prediction errors for small values of C D , and 
compared to training on L/D directly, the performance was very poor. Notice that L/D is plotted 
versus C L in figure 4. The reason why this figure is of particular interest will be explained in the 
following section. 
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ADVANCED APPLICATIONS 

Because the neural networks have provided models that capture the relations between inputs and 
outputs, other utilization beyond new point estimation is straightforward. Two applications are 
considered here, both dealing with the problem of finding flap settings that ensure high 
maneuverability. 

Overall flap setting 

If the plane is flown with minimal change in flap settings, it is desirable to keep the flaps in a 
position that will ensure a high L/D over the topical flight envelope. In the current case this is 
interpreted in terms of a performance index we want maximized. The criterion is the area below the 
L/D vs. C L curve in the C L range [0.15, 0.55], as illustrated in figure 5. 

J(LE,TE) = f° 55 Li D(C L ,LE,TE)dC L (28) 

•' 0.15 
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Figure 5. A rough sketch of the principle. 

Basically the entire surface (J) is useful, but we are particularly interested in the maximum point 

J - arg max J{LE , TE) (29) 

{le,te} 

To find the areas (J), we need a way to express the L/D ratio as a function of C L . To obtain this, a 
network is trained as the “inverse” function 

a = C L ~ ] (C L , LE, TE) (30) 

Since for fixed flap positions, C L vs. oc is an almost straight line over the necessary range of oc, 
modeling the inverse function is not harder than modeling the actual function. In general, C L vs. cc 
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need not be one-to-one in the measured range, and if that is the case, one has to be extra careful 
about the training. The inverse function network should only be trained in the a range where the 
function actually is one-to-one. By using this new network in front of the L/D-network, we get a 
predictor for L/D which depends on LE, TE, and C L . 

Alternatively, a network L/D (C l ,LE, TE) might be trained directly, but for some reason this didn't 
seem to give quite as good results. The performance criterion (J) was then evaluated for a large 
number of flap combinations by applying numerical integration. The integration was carried out 
using the trapezoidal rule 

f " f(x)dx « h[jf(x l ) + f(x 2 ) + --- + f(x N _ l ) + f(x N )] (31) 

Jx l 

J was evaluated at 961 points (each flap was set in 3 1 different positions within their respective 
ranges). For each combination L/D was evaluated at 101 different values of C L in the interval 
[0.15;0.55]. The result is shown below (figs. 6 and 7), using three different representations. 



Figure 6. Surface plot of the integrated L/D for flap combinations of the 55% model. 



Figure 7. (a) The surface plot projected onto the TE-LE plane, (b) As a contour plot. 
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Flap schedules 


To ensure a high maneuverability at all times (i.e., for all angles of attack) it is desirable to constantly 
position the flaps so that maximum L/D is obtained. Doing this is straightforward. For a particular 
choice of alpha, the L/D-network is evaluated at a number of different flap combinations, and the 
combination leading to maximum L/D is stored. This is then repeated for a large number of alpha 
values. The result of this procedure is shown in figure 8. 


Flap angle vs. alpha (LE = solid. TE = dashed) 



Figure 8. The schedule for each of the two flaps. For 101 different values of angle of attack 

o A 

maximum L/D was searched for among 31*31 flap combinations. 

CONCLUSIONS AND FUTURE DIRECTIONS 

The purpose of this study was threefold: To test the neural net method, to produce cost savings by 
minimizing the number of required wind tunnel measurements, and to automate follow-on data 
processing. We are very encouraged by the results obtained so far. In general, the predictions 
provided by networks trained on less than 74% of the original set of measurements are considered to 
lay within the tolerance ranges. Moreover, the peak in figure 6 matched closely the one found by the 
conventional procedure. In December 1994, the tool was applied during test of a newly generated 
50% scale model of the SHARC. Again with impressive results, indicating approximately 40% 
savings (in this case certain alpha vales instead of certain flap combinations were discarded). A 
substantial spin-off from the study has been to provide plots like the ones shown in figures 6-8. By 
following conventional procedures, generating this type of plot is seldom realistic from a cost and 
resource viewpoint. The current method is very rapid and extremely cost effective for this task. 

We believe that neural networks will become an important tool in future NASA Ames efforts to 
move directly from wind tunnel tests to virtual reality simulations of actual full scale aircraft flight 
behavior. Being able to fly the plane at such an early stage would be a tremendous help to flight 
engineers. Many bugs could be eliminated at a much earlier stage, which in turn would significantly 
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cut expenses, as well as time spent on development. Preliminary simulations in our laboratory have 
already demonstrated the principle. Efforts are now underway to explore the application of this 
technique to the estimation of hypersonic flight performance. 
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